#scraping google
artsietango · 2 years ago
Text
This Google Drive AI scraping bullshit actually makes me want to cry. My entire life is packed into Google Drive. All of my writing over the years, all of my academic documents, everything.
I’m just so overwhelmed with all the shit I’m going to have to move. I’m lucky to have Scrivener, but online data storage has been super important as I’ve had so many shitty computers, and the only reason I haven’t lost work is because Google Drive has been my backup storage unit.
My partner has recommended GitLab to move my files to - it seems useful, and I can try and explain more about what it is and how it works when I get more familiar with it. I’m unsure if it’s a text editor, or can work that way. He was explaining something about the version history that I don’t quite understand right now but might later. I’m just super overwhelmed and frustrated that this is the dystopia we live in right now.
29K notes · View notes
sendpseuds · 7 months ago
Text
Tumblr media
if you write fic and haven’t ditched google docs for ellipsus yet, you really should
1K notes · View notes
amalgamasreal · 2 years ago
Text
SOURCE
Bit of a long video but worth a watch.
TL;DW though is that hidden in the Terms and Conditions for Google's AI Labs is a nice little poison pill that says they get access to your entire Google Drive if you opt in.
So if you're an author of some type and you keep your unpublished works in your G-Drive, that means an AI will get to scrape all of it, and by opting in you will have given them permission to do it. The content creator goes on to predict that Google is going to launch their own streaming service where the scripts, and potentially the art if it's animated, will be almost or entirely AI generated using that scraped data as a baseline, and the authors and artists whose work was essentially stolen in its most raw form to crib from will have zero way of fighting Google on that in our current legal system.
This is of course right in the middle of the writers and actors strike where we're seeing just what lengths studios will go to in order to screw everyone but themselves.
They go on to recommend that if you keep any creative or personal works on Google Drive that you pull it off as soon as possible and delete your entire Drive. They acknowledge that of course this doesn't mean Google really deleted the data, but if you do it before they start compulsorily opting everyone in there's a chance your work might get overlooked. They also recommend several free editing programs that aren't run by corporations like Google, with LibreOffice (the default office program of most Linux distros) being named.
Finally they go over methods of shaming Google, which I feel like you just have to watch for comedy's sake, so I won't describe them in full.
Now this is from me: I know the majority of people don't have the ability to build and manage a big archive just for themselves, but if you're a creative NOW IS THE TIME to educate yourself on what you can do to protect your works. Cloud storage was always iffy at best, but with AI scraping entering the mix it's now downright malignant. Get a bunch of thumb drives, buy some external hard drives, if you have the money buy a pre-built NAS, and if you really want to get into it, learn how to build your own NAS. These are the old ways before cloud and they're coming back again, more important than ever.
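For anyone who wants to try the local-backup route, here is a minimal sketch of copying a writing folder to an external drive and checking that each copy matches the original. It is only an illustration, not a full backup tool: the function names `sha256_of` and `back_up` are made up for this example, and it assumes the destination folder sits outside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination: Path) -> list[Path]:
    """Copy every file under source into destination.

    Returns the source files whose copy did not match the original.
    Assumption: destination is NOT inside source.
    """
    mismatched = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        # Recreate the same folder layout on the backup drive.
        dest_file = destination / src_file.relative_to(source)
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest_file)  # copy2 also keeps timestamps
        # Compare checksums to catch a bad copy (for example, a failing drive).
        if sha256_of(src_file) != sha256_of(dest_file):
            mismatched.append(src_file)
    return mismatched
```

Calling something like `back_up(Path("Writing"), Path("/Volumes/Backup/Writing"))` (both paths are placeholders) would return an empty list when every file copied cleanly. Running it against two different drives gives you the redundancy that the thumb drive and NAS advice is after.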
2K notes · View notes
fabaulti · 2 years ago
Text
I think most of us should take the whole ai scraping situation as a sign that we should maybe stop giving google/facebook/big corps all our data and look into alternatives that actually value your privacy.
i know this is easier said than done because everybody under the sun seems to use these services, but I promise you it’s not impossible. In fact, I made a list of a few alternatives to popular apps and services, alternatives that are privacy first, open source and don’t sell your data.
right off the bat I suggest you stop using gmail. it’s trash and not secure at all. google can read your emails. in fact, google has access to all the data on your account and while what they do with it is already shady, I don’t even want to know what the whole ai situation is going to bring. a good alternative to a few google services is skiff. they provide a secure, e2ee mail service along with a workspace that can easily import google documents, a calendar and 10 gb free storage. i’ve been using it for a while and it’s great.
a good alternative to google drive is either koofr or filen. I use filen because everything you upload on there is end to end encrypted with zero knowledge. they offer 10 gb of free storage and really affordable lifetime plans.
google docs? i don’t know her. instead, try cryptpad. I don’t have the spoons to list all the great features of this service, you just have to believe me. nothing you write there will be used to train ai and you can share it just as easily. if skiff is too limited for you and you also need stuff like sheets or forms, cryptpad is here for you. the only downside i could think of is that they don’t have a mobile app, but the site works great in a browser too.
since there is no real alternative to youtube I recommend watching your little slime videos through a streaming frontend like freetube or newpipe. besides the fact that they remove ads, they also stop google from tracking what you watch. there is a bit of functionality loss with these services, but if you just want to watch videos privately they’re great.
if you’re looking for an alternative to google photos that is secure and end to end encrypted you might want to look into stingle, although in my experience filen’s photos tab works pretty well too.
oh, also, for the love of god, stop using whatsapp, facebook messenger or instagram for messaging. just stop. signal and telegram are literally here and they’re free. spread the word, educate your friends, ask them if they really want anyone to snoop around their private conversations.
regarding browsers, you know the drill. throw google chrome/edge in the trash (they’re really basically spyware disguised as browsers) and download either librewolf or brave. mozilla firefox can be a great secure option too, with a bit of tinkering.
if you wanna get a vpn (and I recommend you do) be wary that some of them are scammy. do your research, read their terms and conditions, familiarise yourself with their model. if you don’t wanna do that and are willing to trust my word, go with mullvad. they don’t keep any logs. it’s 5 euros a month with no different pricing plans or other bullshit.
lastly, whatever alternative you decide on, what matters most is that you don’t keep all your data in one place. don’t trust a service to take care of your emails, documents, photos and messages. store all these things in different, trustworthy (preferably open source) places. there is absolutely no reason google has to know everything about you.
do your own research as well, don’t just trust the first vpn service your favourite youtuber gets sponsored by. don’t trust random tech blogs to tell you what the best cloud storage service is, because they get good money for advertising one or the other. compare shit on your own or ask a tech savvy friend to help you. you’ve got this.
1K notes · View notes
vamprisms · 4 months ago
Text
"they had mpreg in middle-earth"
- J.R.R. Tolkien
77 notes · View notes
ochzarunoki · 10 months ago
Text
Tumblr media
The duo in their full color glory 😭✨
131 notes · View notes
draconesmundi · 11 months ago
Text
Happy Dracones Monday! The Vishap
Tumblr media
Found in the Armenian highlands, in Armenia, Azerbaijan, Iran and Turkey. Here we see one perched on a vishapakar stone ('dragon stone') in Armenia - these stones are often in the shape of fish or have a carving of some sort of animal sacrifice on them (often a bull), and sometimes they have a mix of fish and bull carvings.
This is just my interpretation of a vishap for Dracones Mundi - I chose to go a little more flamboyant with the design rather than make something that looked like a bull or a fish - especially as it's uncertain whether the carvings on the vishapakar are supposed to represent the dragons physically, or if they are more symbolic of summoning good luck for livestock, fishing or fertility.
I post new dragons for my project every Monday on this blog: @draconesmundi
103 notes · View notes
bmpmp3 · 3 months ago
Text
voicevox humming might also be good for those looking for weird metallic noisy vocal synths if they're willing to play with the unpredictability of it because i had akashi on the absolutely wrong range setting for this song and he started breaking down like a faulty motor
21 notes · View notes
monetmightexist · 7 days ago
Text
Been thinking about the ao3 scrape. Looked into it, and I feel it's important to acknowledge, first, the fact that every website that was scraped has had their datasets either disabled (temporarily, though it's highly unlikely that nyuuzyou will win any of their cases) or deleted. AI programs on HuggingFace cannot be trained on any of the data that was scraped. This specific iteration of the problem is, from what I can tell, solved.
But it's still incredibly concerning.
Someone could just... do that. Steal millions of works, both writing and art, and then have the audacity to fight against the DMCAs they've received.
Now, as idiotic as it sounds, I don't plan on restricting my fics. I've had a good number of guests leave kudos and comments, and I respect their decision to do so anonymously.
As much as I'd hate to have my words reused in a generator, I have to remember that I have faults.
If my fics were fed into a robot, it wouldn't stop talking about the characters' eyes and eyebrows. It would have those random typos I keep making in Fish in a Birdcage. If left unfiltered, it would curse randomly and rather excessively. Would it know what to do with page breaks? Would it be able to learn my exact usage of italics, or would it just guess randomly, if at all? If it were trying to replicate my QSMP fics with other languages scattered throughout them, would it be able to recognize that and just start throwing in random Spanish or French without reason? Or would it start making shit up, not having a translator built in because the laziest person alive didn't consider that because the fic was labeled as English?
There are a large number of chatfics on ao3. If every single piece of fanfiction was thrown into a robot, I wouldn't be surprised for a piece of narration to be randomly interrupted by YouTube-comment-esque dialogue. And maybe the shorthand typing would end up in the normal narration, too. Even if a person filtered out tags to reduce faults, there are still so many untagged fics. Not to mention, AI fics being fed into an AI generator will fuck up so much shit.
Authors make formatting mistakes. Authors forget punctuation. Authors may learn some CSS code to throw into a fic that would be incredibly hard to interpret. Authors throw headcanons onto characters that may change their gender, appearance, etc. The best thing about fandom is that each person experiences it differently. Trying to mix all of these into one will, with enough work I suppose, create a product that some people find to be acceptable. But it will be so harrowingly inconsistent and confusing that no one could ever fully enjoy it.
Ao3 is, quite possibly, one of the most diverse websites out there. Which makes it also a horrible training ground for AI, which has the sole capability of being able to follow directions consistently.
Yes, your works have been stolen. Yes, my works will probably continue to get stolen. Yes, it will suck ass, and some lonely bitch will manage to make a few cheap bucks off of it.
But all that matters to me right now is that AI never has the life, the ideas, the experiences, nor the expression that a human does.
14 notes · View notes
heywriters · 1 year ago
Note
Do you know alternatives to Google Docs that include having the files online? I worry about my computer crashing and losing my local files. I looked into LibreOffice Online, but it doesn't seem like files get saved online.
I don't back up my files online, sorry, I use flash drives and sometimes external hard drives. I'm sure someone out there has made a list of online resources they feel are trustworthy.
45 notes · View notes
slightly-awkward-sunshine · 11 months ago
Text
Well, Instagram is fucked. If you haven't heard, you can't "opt out" of Meta scraping your images for AI if you live in the US. (Also I highly doubt that they will honor the "opting out" policy.)
I'm struggling to get Glaze working on my macbook, so if anyone knows of other AI shading software for images lmk
In the meantime, I'll try posting some art on Cara -- I still don't want to post un-glazed images, but it's a start. I'll also probably start posting my work on here again (hi Tumblr!!) but it still really sucks because I know that most artists get their work from social media, namely Instagram, and I'd been working hard to grow my following :(
Just want to give all artists a big hug and a "keep your chin up kid" but I know it looks bleak rn :/
39 notes · View notes
midwestsolidarity · 4 months ago
Text
just made my ao3 wrapped, do not recommend. it took me the whole day and told me that around half the fanfiction I read is marked explicit based on word count. I did not expect to be called out like this. Also according to my calculations I’ve read 23 million words or the equivalent of ~250 novels in fanfic this year which is perchance too much to be healthy 💀💀
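The conversion in that post can be checked with a couple of lines. This is only a sketch of the arithmetic: 23 million words and ~250 novels come from the post, the per-novel length is simply what those two numbers imply, and `words_to_novels` is a made-up helper name.

```python
WORDS_READ = 23_000_000   # total from the post
NOVELS_CLAIMED = 250      # "~250 novels" from the post

# Average novel length implied by the post's own two numbers.
implied_words_per_novel = WORDS_READ / NOVELS_CLAIMED


def words_to_novels(words: int, words_per_novel: float) -> float:
    """Convert a word count into novel equivalents for an assumed novel length."""
    return words / words_per_novel


print(f"Implied novel length: {implied_words_per_novel:,.0f} words")
print(f"Novel equivalents: {words_to_novels(WORDS_READ, implied_words_per_novel):.0f}")
```

Plugging in a different assumed novel length changes the answer in proportion, so a shorter "typical novel" gives a bigger number.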
10 notes · View notes
anti-rop · 3 months ago
Note
how can you champion free speech and then celebrate when millions of voices on tiktok are censored. hypocrite.
I didn’t want to talk about politics on this blog but, oh well, here we go. Response under the cut.
Let me preface this: I’ve never been a fan of TikTok and when talk of a ban first started to come onto the scene 6 years ago, I thought it was a good thing, for a multitude of reasons but I won't go into all of it. I'll focus on what the proposed ban and SCOTUS corresponded to. This is a topic of US national security and the type of precedents it sets for foreign companies operating in the US. I thought it would be good to act now [2019] rather than later [2025] because looking at the growth curve, it was a service that would easily become so popular that lawmakers would find themselves in an impossible position and a ban would never happen. 
Unfortunately, that’s exactly what’s happened. Again, in my opinion, now a horrible precedent exists. To any foreign government out there, the message is that you are allowed to enter US markets under any pretense, with zero reciprocity for US companies, and as long as you are popular and influential enough the US government and population will go out of its way to facilitate your access.
If we are going to go to such extraordinary lengths for a foreign company and government the US must make a demand of absolute reciprocity, in my opinion. Meta, X, Google, Snapchat, and other US-based technology companies must be allowed total market access in China immediately with zero control by the Chinese Government (because that is what they have done through ByteDance owning Tiktok). When the Chinese government inevitably laughs at this demand, ask yourself why. They correctly see Meta, X, Facebook, and Google as instruments of US soft power and as cultural contamination of their civic ideal which undermines their hold on power.
However, we seem to naively believe we're immune from the same influence, and we have waited so long to act that we now face terrible choices. The one we've made inevitably means we will now have a natural experiment in what it means to allow a government that actively seeks to undermine our civic institutions to hold the most powerful known technological tool for doing so. And the fact that the CCP and ByteDance decided to “shut it down” rather than divest it tells us everything we need to know. No free enterprise would willingly shut off access to 170 million users.
Also, we should be concerned that millions of Americans acted like drug addicts going through withdrawal when they couldn't access a social media app for roughly 12 hours. That is also cause for great concern. But that's a conversation for another day.
6 notes · View notes
melyzard · 2 years ago
Text
Time for a new edition of my ongoing vendetta against Google fuckery!
Hey friends, did you know that Google is now using Google docs to train its AI, whether you like it or not? (link goes to: zdnet.com, July 5, 2023). Oh and on Monday, Google updated its privacy policy to say that it can train its two AIs (Bard and Cloud AI) on any data it scrapes from its users, period. (link goes to: The Verge, 5 July 2023). Here is Digital Trends also mentioning this new policy change (link goes to: Digital Trends, 5 July 2023). There are a lot more, these are just the most succinct articles that might explain what's happening.
FURTHER REASONS GOOGLE AND GOOGLE CHROME SUCK TODAY:
Stop using Google Analytics, warns Sweden’s privacy watchdog, as it issues over $1M in fines (link goes to: TechCrunch, 3 July 2023) [TLDR: google got caught exporting european users' data to the US to be 'processed' by 'US government surveillance,' which is HELLA ILLEGAL. I'm not going into the Five Eyes, Fourteen Eyes, etc agreements, but you should read up on those to understand why the 'US government surveillance' people might ask Google to do this for countries that are not a part of the various Eyes agreements - and before anyone jumps in with "the US sucks!" YES but they are 100% not the only government buying foreign citizens' data, this is just the one the Swedes caught. Today.]
PwC Australia ties Google to tax leak scandal (link goes to: Reuters, 5 July 2023). [TLDR: the accounting firm PwC Australia slipped Google "confidential information about the start date of a new tax law leaked from Australian government tax briefings." Gosh, why would Google want to spy on governments about tax laws? Can't think of any reason they would want to be able to clean house/change policy/update their user agreement to get around new restrictions before those restrictions or fines hit. Can you?]
SO - here is a very detailed list of browsers, updated on 28 June, 2023 on slant.com, that are NOT based on Google Chrome (note: any browser that says 'Chromium-based' is just Google wearing a party mask. It means that Google AND that other party have access to all your data). This is an excellent list that shows pros and cons for each browser, including who the creator is and what kinds of policies they have (for example, one con for Pale Moon is that the creator doesn't like Tor and thinks all websites should be hostile to it).
101 notes · View notes
thai-drama-ao3-stats · 10 months ago
Text
Thai Drama Stats Special Edition:
The Great Archive Lockdown 🔒
Hi folks! In case you weren't aware, there are various scraping bots that trawl through AO3 and use the data for AI training, content mill sites, or other vaguely nefarious purposes. One site, "Fanfic Books", is essentially creating an unauthorized mirror of AO3. Here are some posts about it.
To combat this, many users have recently chosen to "Archive Lock" their fics.
What is Archive Locking?
An archive locked work, or "restricted" work, is only visible to users who are logged into AO3. This prevents anonymous users (and bots that aren't using login credentials) from reading your fic or finding your fic in searches. This doesn't block all scraping bots, but it should keep most of them out.
What does this have to do with fandom stats?
The AO3 scraping I do doesn't use login credentials, so I can't count archive locked fics. That's totally fine! I am in no way telling you to stop archive locking! Lock or unlock to your heart's content!
It does, however, mean that the data I pulled from my Thai Drama AO3 Trends Dashboard this week (July 1 - July 7, 2024) are looking especially strange.
Tumblr media
Holy moly! We actually have negative growth. More fics were locked than posted, which is why the Net New is negative. I'd estimate that about 1% of all previously public Thai Drama fics were archive locked this week.
This matches trends on all of AO3. This week, the total number of publicly available fics actually decreased by 0.7% -- and that's including all the new fics being posted!
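Here's a rough sketch of how a negative Net New falls out of the numbers. The function names and the example counts are made up for illustration (they are not the dashboard's real data), and it assumes the only things changing the public count are new posts and newly locked fics.

```python
def net_new_public(posted: int, newly_locked: int) -> int:
    """Net change in publicly visible fics over a period.

    Assumption: deletions are ignored; only posting and locking are counted.
    """
    return posted - newly_locked


def percent_change(previous_total: int, net_new: int) -> float:
    """Net change as a percentage of the previous public total."""
    return 100 * net_new / previous_total


# Hypothetical week: 150 fics posted, 430 previously public fics locked,
# starting from 40,000 public fics.
net_new = net_new_public(posted=150, newly_locked=430)
print(net_new)                                  # negative: more locked than posted
print(round(percent_change(40_000, net_new), 2))
```

With these made-up inputs the public total shrinks even though new fics were posted, which is the same pattern as the dashboard's negative growth.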
When did this happen?
The timing for both Thai Drama fandom and all of AO3 is pretty consistent.
Tumblr media
For Thai Drama fandom, most of the locking happened on Friday, July 5, but there was also some locking on Sunday.
Tumblr media
When we look at all of AO3, it seems like most of the mass-locking happened on July 5th as well, with additional locking happening all throughout the weekend.
Which Thai Drama fandoms were most affected?
Tumblr media
When we look at sheer numbers, KinnPorsche, of course, has a lot of newly-locked fics. 3 Will Be Free, My Engineer, and Dark Blue Kiss were locked down a lot as well.
Tumblr media
When we look at the top fandoms by negative growth, The Player saw almost all of its fics vanish overnight. 3 Will Be Free was cut neatly in half.
This data is cool I guess, but... so what?
If these numbers are accurate, it represents another sudden and massive shift towards archive locking on AO3.
According to @star-grazing's stats about archive locking in December 2022, the total number of archive locked works on AO3 increased by 70% in just a couple weeks after a reddit post went viral about AI bots scraping AO3 for machine learning material.
Those stats show that in December 2022, 5.79% of AO3 fics were archive locked. When I checked the numbers again today, 9.37% of all works were archive locked.
Using rough estimates, from the last few days of AO3 data, I'd say that the total number of archive locked works increased by 8% since last Thursday (7/4). And trends seem to indicate that the great lockdown is still going!
Anyway...
Thanks for sticking with me! This is a really fun time to be collecting AO3 stats :) If you have more questions, feel free to reach out. I also put some more details under the cut! Thanks y'all!
Are we sure it was archive locking, and not some other data issue?
Er, good question. It's my best guess, and I've tried to rule out other potential culprits. The AO3 Fandom Trend Analysis Dashboard, which has data about all fandoms on AO3, doesn't seem to show anything amiss. Their data uses login credentials, meaning they can count archive locked fics.
I also went through several tags manually while logged in and logged out to compare numbers from this week to previous weeks. It doesn't seem like there was a mass deletion or retag that I could see.
I also used the "restricted:true" search operator to search for archive locked fics while logged in. A lot of those missing fics pop back up!
I absolutely welcome other theories though, if you think of one!
Is this still happening?
Seemingly yes, for Thai Dramas at least! When I checked the "All Thai Dramas" AO3 search this morning, the total number of Thai Drama fics had dropped below the 40K mark - lower than when I first started keeping track a month ago!
Tumblr media
We probably have a lot more archive locking in our future!
How do I archive lock my own fics?
There's a really good tutorial over here.
Help! I don't have an AO3 account, so I can't read all these archive locked fics anymore.
Please message me! I have some spare invites.
Which fandoms are the most "locked down"?
I'm not sure, but there is a Fanlore article about Hockey RPF and the Fourth Wall which provides some comparison stats. Hockey fandom has traditionally been one of the most locked down fandoms; less than half of hockey rpf fics are publicly available.
You can also peruse this AO3 search to see all archive locked fics.
15 notes · View notes
pybun · 1 year ago
Text
what's the safest place on the internet where i can write documents and no AI will scrape it?
25 notes · View notes